Preview dataset #1288
Signed-off-by: huongg <[email protected]>
@@ -20,6 +20,9 @@

logger = logging.getLogger(__name__)

PREVIEW_DATASETS = [
    "pandas.csv_dataset.CSVDataSet",
    "pandas.parquet_dataset.ParquetDataSet",
    "pandas.excel_dataset.ExcelDataSet",
]
Supporting Spark and launch would be a really nice stretch goal, but not a dealbreaker
if self.type in PREVIEW_DATASETS:
    # Only preview when the installed kedro-datasets version provides _preview
    if hasattr(dataset, "_preview"):
        self.preview = dataset._preview(40)
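For illustration only, here is a minimal standard-library sketch of what "previewing the first N rows" can look like. The helper name `preview_rows` and the returned dictionary shape are assumptions made for this example; this is not the `_preview` implementation from kedro-datasets.

```python
import csv
import io
from itertools import islice


def preview_rows(lines, nrows=40):
    """Return the header and at most `nrows` data rows from CSV text lines.

    Hypothetical helper for illustration; not the kedro-datasets `_preview`.
    """
    reader = csv.reader(lines)
    columns = next(reader, [])  # the first row is treated as the header
    data = list(islice(reader, nrows))  # stop reading after `nrows` rows
    return {"columns": columns, "data": data}


sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
preview = preview_rows(sample, nrows=2)
print(preview)
```

Reading lazily with `islice` means only the previewed rows are parsed, which matters when the underlying file is large.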
Based on our telemetry, hardly anyone uses this.
That's interesting. I would argue this is a slightly more compelling thing for users to actually change, but it's also something we can wait to see if users start asking for :)
this is now done @datajoely 😄 I just included this today, by default this feature is always on unless user chooses to change it
Is it toggled or can they pass the preview number of rows?
it is just toggled on and off at the moment, with the number of rows set to 40
Hugely excited about this :)
Great work @Huongg! ⭐
I have many comments, but most importantly I think you should test three things before you make any changes:
- what happens if you change your code in `_preview` in kedro-datasets to raise an exception (easiest way is to just put in the code `1/0`)?
- what happens if you run it on a dataset that's `pandas.CSVDataSet` but the file doesn't exist?
- what happens if you run it on a dataset that's `pandas.CSVDataSet` but with some data missing?
I suspect the first two of these will result in no metadata panel loading for the dataset and that the 3rd test case will work.
Assuming I'm right here, you should then:
- Write a test that fails test case 2
- Make the changes I suggest
- Test the cases that failed before and hopefully they pass now
I have more ideas for where we should go with this feature, but I'll post it on a separate issue. I think we're also going to see cases which aren't currently handled well e.g. if the rows have labels as well as columns, but I'm happy to release this as an MVP for now.
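A minimal sketch of the three test cases above, assuming the preview call is wrapped so that a failure leaves the preview empty instead of breaking the metadata panel. The `safe_preview` helper and the three stub classes are hypothetical, written only for this example; they are not the code that was merged.

```python
import logging

logger = logging.getLogger(__name__)

PREVIEW_DATASETS = [
    "pandas.csv_dataset.CSVDataSet",
    "pandas.parquet_dataset.ParquetDataSet",
    "pandas.excel_dataset.ExcelDataSet",
]


def safe_preview(dataset, dataset_type, nrows=40):
    """Return preview data, or None if preview is unsupported or fails."""
    if dataset_type not in PREVIEW_DATASETS:
        return None
    if not hasattr(dataset, "_preview"):
        return None  # older dataset versions may not provide _preview
    try:
        return dataset._preview(nrows)
    except Exception as exc:  # deliberately broad: a preview must not break the panel
        logger.warning("Preview failed for %s: %s", dataset_type, exc)
        return None


class BrokenPreview:
    """Stub for case 1: _preview itself raises (the 1/0 trick)."""

    def _preview(self, nrows):
        return 1 / 0


class MissingFile:
    """Stub for case 2: the underlying file does not exist."""

    def _preview(self, nrows):
        raise FileNotFoundError("no such file")


class MissingValues:
    """Stub for case 3: the data loads but contains a missing value."""

    def _preview(self, nrows):
        return {"columns": ["a", "b"], "data": [[1, None]][:nrows]}


CSV = "pandas.csv_dataset.CSVDataSet"
results = [safe_preview(cls(), CSV) for cls in (BrokenPreview, MissingFile, MissingValues)]
```

In this sketch the first two cases yield `None` and the third returns its data unchanged, which mirrors the expectation that only missing values should still produce a preview.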
Really nice work 🎉
Just to double check, what now happens in these three cases?
- change your code in `_preview` in kedro-datasets to raise an exception (easiest way is to just put in the code `1/0`)?
- a dataset that's `pandas.CSVDataSet` but the file doesn't exist?
- a dataset that's `pandas.CSVDataSet` but with some data missing?
@@ -13,6 +13,7 @@ Please follow the established format:
- Remove metrics plots from metadata panel and add link to the plots on Experiment tracking. (#1268)
- Link plot and JSON dataset names from experiment tracking to the flowchart. (#1165)
- Bump minimum version of React from 16.8.6 to 17.0.2. (#1282)
- Show preview of data in metadata panel. (#1288)
Might be worth putting this at the top of the list since to users it's much more interesting than the other changes 😀 (maybe the react version point should go under bug fixes and other changes?)
Maybe also worth saying "preview of `pandas.CSVDataSet` and `pandas.ExcelDataSet`" too.
hey i like the idea of including the "preview of pandas.CSVDataSet and pandas.ExcelDataSet" here too. Even though it will also mention it in our Release highlight in the UI, I guess no harm to mention here again.
I think the React version might be a major change actually but maybe @tynandebold can confirm this?
The React version change will make this a major release, so probably ok to leave it where it is.
hey @AntonyMilneQB thank you. So to confirm:
Are these what you expected?
Yes, all as I expected thank you! 👍
Amazing!!!!!! Thanks Huong!
Description
Fixes for #907. For now, every time the user clicks on CSV, Excel or Parquet Dataset in viz, it will load the first 40 rows in the metadata panel
Design:
Link to the journey and behaviour description
Link to the metadata side panel
Development notes
To test this locally, ensure you clone and pull the latest changes from this branch: https://github.com/kedro-org/kedro-plugins/tree/preview-csv-dataset
Then cd to the location of kedro-plugins/kedro-datasets and run pip install -e .
To check, run pip list to see if kedro-datasets and kedro-viz are pointed to your local machine.
Once done, go back to kedro-viz/demo-project, run kedro run, then kedro viz. It should show you all the changes from both repos.
WIP: tests to be added to cover both changes, here and in kedro-plugins
QA notes
Checklist
RELEASE.md file